Plotting in R

Explore different types of plots in ggplot2

ggplot2 is an R package for creating graphics based on The Grammar of Graphics [1]. A plot is described by its data and aesthetic mappings, drawn as one or more layers, and refined with scales, a coordinate system, facets, and a theme.

Required packages

Make sure that you have the following packages installed: palmerpenguins, tidyverse, ggpubr, plotly, dygraphs, and DT.
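One way to check for the packages loaded in this tutorial is sketched below. The vector name `pkgs` and the `interactive()` guard are choices made for this illustration: the guard keeps the script from installing anything when it runs non-interactively (for example while knitting).

```r
# Packages loaded at some point in this tutorial
pkgs <- c("palmerpenguins", "tidyverse", "ggpubr", "plotly", "dygraphs", "DT")

# requireNamespace() returns TRUE when a package can be loaded
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install only from an interactive session (an assumption of this sketch)
  if (interactive()) install.packages(missing_pkgs)
}
```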

Dataset

We will use the palmerpenguins dataset. Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.

We will briefly check the data structure before we start plotting.

library(palmerpenguins)  # load palmerpenguins dataset
head(penguins)  #check table structure
# A tibble: 6 x 8
  species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
  <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
1 Adelie  Torge…           39.1          18.7              181        3750 male 
2 Adelie  Torge…           39.5          17.4              186        3800 fema…
3 Adelie  Torge…           40.3          18                195        3250 fema…
4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
5 Adelie  Torge…           36.7          19.3              193        3450 fema…
6 Adelie  Torge…           39.3          20.6              190        3650 male 
# … with 1 more variable: year <int>
summary(penguins)  #summarize data
      species          island    bill_length_mm  bill_depth_mm  
 Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10  
 Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60  
 Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30  
                                 Mean   :43.92   Mean   :17.15  
                                 3rd Qu.:48.50   3rd Qu.:18.70  
                                 Max.   :59.60   Max.   :21.50  
                                 NA's   :2       NA's   :2      
 flipper_length_mm  body_mass_g       sex           year     
 Min.   :172.0     Min.   :2700   female:165   Min.   :2007  
 1st Qu.:190.0     1st Qu.:3550   male  :168   1st Qu.:2007  
 Median :197.0     Median :4050   NA's  : 11   Median :2008  
 Mean   :200.9     Mean   :4202                Mean   :2008  
 3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009  
 Max.   :231.0     Max.   :6300                Max.   :2009  
 NA's   :2         NA's   :2                                 

As we can see from the summary table, three different species of penguins were recorded on three different islands.
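The summary lists species and islands separately, so it does not show which species occur on which island. A cross-tabulation is one simple way to look at that; the object name `species_by_island` is only for illustration.

```r
library(palmerpenguins)

# rows = species, columns = islands, cells = number of recorded penguins
species_by_island <- table(penguins$species, penguins$island)
species_by_island
```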

Scatterplot

Let’s explore whether there is a correlation between the body mass of the penguins and their flipper length.

library(tidyverse)  # load the tidyverse package; contains ggplot2

ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) + geom_point()
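The scatterplot gives a visual impression of the relationship. As a complement, the correlation can be quantified with base R. This is a minimal sketch that uses Pearson's correlation (the default) and drops rows with missing values; whether Pearson is the right measure for your question is an assumption to check.

```r
library(palmerpenguins)

# Pearson correlation, using only rows where both values are present
r <- cor(penguins$flipper_length_mm, penguins$body_mass_g, use = "complete.obs")
r

# test of the null hypothesis that the correlation is zero
ct <- cor.test(penguins$flipper_length_mm, penguins$body_mass_g)
ct
```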

Let’s add a linear trendline with geom_smooth():

ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) + geom_point() + geom_smooth(method = "lm")

Let’s add a trendline together with the equation

library(ggpubr)  # package that facilitates the display of the equation

# after_stat() replaces the older ..name.. notation, which is deprecated in recent ggplot2
ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) + geom_point() + geom_smooth(method = "lm") +
    stat_regline_equation(label.y = 6000, aes(label = after_stat(eq.label))) + stat_regline_equation(label.y = 5600,
    aes(label = after_stat(rr.label)))
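The equation and R² printed on the plot come from an ordinary linear regression, so they can be cross-checked by fitting the same model with lm(). The object name `fit` is just for this sketch.

```r
library(palmerpenguins)

# same model that geom_smooth(method = "lm") fits: y ~ x
fit <- lm(body_mass_g ~ flipper_length_mm, data = penguins)

coef(fit)                # intercept and slope of the trendline
summary(fit)$r.squared   # R squared shown by the rr.label
```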

Are there any differences between the species?

# per-species regression equations would overlap here, so we add them later with faceting
ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
    geom_point() + geom_smooth(method = "lm")

ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
    geom_point(aes(shape = sex))

Change point size

ggplot(penguins, aes(x = flipper_length_mm, y = bill_length_mm, color = species)) +
    geom_point(aes(shape = sex, size = body_mass_g))

Faceting

# one panel per species, so each regression equation gets its own space
ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
    geom_point(aes(shape = sex)) + facet_wrap(~species, scales = "free_x") + geom_smooth(method = "lm",
    se = FALSE) + stat_regline_equation(label.y = 6000, aes(label = after_stat(eq.label))) +
    stat_regline_equation(label.y = 5800, aes(label = after_stat(rr.label)))
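facet_wrap() splits the plot by one variable. When two variables are of interest, facet_grid() arranges the panels in rows and columns instead. The sketch below is a variation on the plot above, not part of the original material: it removes penguins of unknown sex first (so no NA row of panels appears) and uses the illustrative names `penguins_sexed` and `p_grid`.

```r
library(ggplot2)
library(palmerpenguins)

# keep only penguins with a recorded sex
penguins_sexed <- subset(penguins, !is.na(sex))

# rows = sex, columns = species
p_grid <- ggplot(penguins_sexed,
                 aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_grid(sex ~ species, scales = "free_x")
p_grid
```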

Themes

ggplot2 ships with several built-in themes, and theme() offers a long list of further cosmetic changes. Let’s try changing themes in another type of plot, the histogram. We will plot the distribution of flipper length for each species, starting with my favourite theme, theme_bw(). A gallery of the built-in themes is available here:

https://r-charts.com/ggplot2/themes/

ggplot(penguins, aes(flipper_length_mm, fill = species)) + geom_histogram(alpha = 0.6,
    position = "identity") + theme_bw()

ggplot(penguins, aes(flipper_length_mm, fill = species)) + geom_histogram(alpha = 0.6,
    position = "identity") + theme_void()

Themes can be modified

Labels

ggplot(penguins, aes(flipper_length_mm, fill = species)) + geom_histogram(alpha = 0.6,
    position = "identity") + theme_bw() + labs(x = "Flipper length (mm)", y = "Counts") +
    theme(axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14))

Axis

# modify axis font size
ggplot(penguins, aes(flipper_length_mm, fill = species)) + geom_histogram(alpha = 0.6,
    position = "identity") + theme_bw() + labs(x = "Flipper length (mm)", y = "Counts") +
    theme(axis.text.x = element_text(color = "black", size = 12), axis.text.y = element_text(color = "black",
        size = 12), axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14))

You can also change the color, angle, and justification of the axis labels.

# rotate the x-axis tick labels and change the axis title color
ggplot(penguins, aes(flipper_length_mm, fill = species)) + geom_histogram(alpha = 0.6,
    position = "identity") + theme_bw() + labs(x = "Flipper length (mm)", y = "Counts") +
    theme(axis.text.x = element_text(color = "black", size = 12, angle = 45), axis.text.y = element_text(color = "black",
        size = 12), axis.title.x = element_text(color = "grey30", face = "bold",
        size = 14), axis.title.y = element_text(color = "grey30", face = "bold",
        size = 14))

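The horizontal and vertical justification (hjust and vjust) can also be adjusted. Both arguments take values between 0 and 1: for unrotated text, hjust = 0 left-aligns and hjust = 1 right-aligns, while vjust = 0 aligns to the bottom and vjust = 1 to the top. [Figure: illustration of hjust and vjust values; source: Stack Overflow]

For example: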

ggplot(penguins, aes(flipper_length_mm, fill = species)) + geom_histogram(alpha = 0.6,
    position = "identity") + theme_bw() + labs(x = "Flipper length (mm)", y = "Counts") +
    theme(axis.text.x = element_text(color = "black", size = 12, angle = 45, hjust = 1,
        vjust = 1), axis.text.y = element_text(color = "black", size = 12), axis.title.x = element_text(color = "grey30",
        face = "bold", size = 14), axis.title.y = element_text(color = "grey30",
        face = "bold", size = 14))

Legends

ggplot(penguins, aes(flipper_length_mm, fill = species)) + geom_histogram(alpha = 0.6,
    position = "identity") + theme_bw() + labs(x = "Flipper length (mm)", y = "Counts",
    fill = "Species") + theme(axis.text.x = element_text(color = "black", size = 12),
    axis.text.y = element_text(color = "black", size = 12), axis.title.x = element_text(color = "black",
        face = "bold", size = 14), axis.title.y = element_text(color = "black", face = "bold",
        size = 14), legend.title = element_text(color = "black", face = "bold", size = 14),
    legend.text = element_text(size = 12))
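The same theme() settings are repeated in every plot above. One way to avoid the repetition is to store them once as a theme object and add that object to each plot. This is a sketch: `theme_tutorial` and `p_hist` are names invented here, and it sets the parent elements axis.text and axis.title, which apply to both the x and y axes.

```r
library(ggplot2)
library(palmerpenguins)

# reusable theme: theme_bw() plus the text settings used in this tutorial
theme_tutorial <- theme_bw() +
  theme(
    axis.text    = element_text(color = "black", size = 12),
    axis.title   = element_text(color = "black", face = "bold", size = 14),
    legend.title = element_text(color = "black", face = "bold", size = 14),
    legend.text  = element_text(size = 12)
  )

# bins = 30 is the default; stating it silences the bin-width message
p_hist <- ggplot(penguins, aes(flipper_length_mm, fill = species)) +
  geom_histogram(alpha = 0.6, position = "identity", bins = 30) +
  labs(x = "Flipper length (mm)", y = "Counts", fill = "Species") +
  theme_tutorial
p_hist
```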

Colors

Barplot

Boxplots

ggplot(penguins, aes(x = species, y = flipper_length_mm, fill = sex)) + geom_boxplot() +
    theme_bw() + labs(x = "Species", y = "Flipper length (mm)") + theme(axis.text.x = element_text(color = "black",
    size = 12), axis.text.y = element_text(color = "black", size = 12), axis.title.x = element_text(color = "black",
    face = "bold", size = 14), axis.title.y = element_text(color = "black", face = "bold",
    size = 14))

ggplot(na.omit(penguins), aes(x = species, y = flipper_length_mm, fill = sex)) +
    geom_boxplot() + theme_bw() + labs(x = "Species", y = "Flipper length (mm)") +
    theme(axis.text.x = element_text(color = "black", size = 12), axis.text.y = element_text(color = "black",
        size = 12), axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14))

ggplot(na.omit(penguins), aes(x = species, y = flipper_length_mm, fill = sex)) +
    geom_boxplot() + geom_jitter(color = "black", size = 0.4, alpha = 0.9) + theme_bw() +
    labs(x = "Species", y = "Flipper length (mm)") + theme(axis.text.x = element_text(color = "black",
    size = 12), axis.text.y = element_text(color = "black", size = 12), axis.title.x = element_text(color = "black",
    face = "bold", size = 14), axis.title.y = element_text(color = "black", face = "bold",
    size = 14))
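In the last boxplot, geom_jitter() spreads the points around each species position without separating them by sex, so the points do not line up with the dodged boxes. A possible variation uses position_jitterdodge(), which jitters and dodges at the same time. The jitter width, seed, and hidden boxplot outliers below are choices made for this sketch.

```r
library(ggplot2)
library(palmerpenguins)

p_box <- ggplot(na.omit(penguins),
                aes(x = species, y = flipper_length_mm, fill = sex)) +
  # hide boxplot outliers so they are not drawn twice
  geom_boxplot(outlier.shape = NA) +
  # dodge.width = 0.75 matches the default dodge of geom_boxplot()
  geom_point(
    position = position_jitterdodge(jitter.width = 0.2, dodge.width = 0.75, seed = 1),
    color = "black", size = 0.4, alpha = 0.9
  ) +
  theme_bw() +
  labs(x = "Species", y = "Flipper length (mm)")
p_box
```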

Violinplots

ggplot(na.omit(penguins), aes(x = species, y = flipper_length_mm, fill = sex)) +
    geom_violin() + theme_bw() + labs(x = "Species", y = "Flipper length (mm)") +
    theme(axis.text.x = element_text(color = "black", size = 12), axis.text.y = element_text(color = "black",
        size = 12), axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14))

ggplot(na.omit(penguins), aes(x = species, y = flipper_length_mm, fill = sex)) +
    geom_violin() + geom_boxplot(position = position_dodge(width = 0.9), width = 0.2) +
    theme_bw() + labs(x = "Species", y = "Flipper length (mm)") + theme(axis.text.x = element_text(color = "black",
    size = 12), axis.text.y = element_text(color = "black", size = 12), axis.title.x = element_text(color = "black",
    face = "bold", size = 14), axis.title.y = element_text(color = "black", face = "bold",
    size = 14))

Colors

Plot interactively with plotly or dygraphs

Dataset

The following material is from XXX

Get URL to CSV

1. Visit the ERDDAP server https://oceanview.pfeg.noaa.gov/erddap and do a Full Text Search for Datasets using “cciea” in the text box before clicking Search. These are all the California Current IEA datasets.
2. From the listing of datasets, click on data for the “CCIEA Anthropogenic Drivers” dataset.
3. Note the filtering options for time and other variables like consumption_fish (Millions of metric tons) and cps_landings_coastwide (1000s metric tons). Set the time filter from being only the most recent time to the entire range of available time from 1945-01-01 to 2020-01-01.
4. Scroll to the bottom and Submit with the default .htmlTable view. You get a web table of the data. Notice the many missing values in earlier years.
5. Go back in your browser to change the File type to .csv. Now, instead of clicking Submit, click on Just generate the URL. The generated URL lists all variables to include, but including everything is already the default, so we can strip off everything after the .csv, starting with the query parameters ? .
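Stripping the query string can also be done in R rather than by hand. In this sketch the value of `generated_url` is a hypothetical, shortened stand-in for what ERDDAP generates; only the part before the ? matches the URL used in this tutorial.

```r
# hypothetical generated URL (the variable list after "?" is made up for illustration)
generated_url <- "https://oceanview.pfeg.noaa.gov/erddap/tabledap/cciea_AC.csv?time,consumption_fish"

# drop the "?" and everything after it
csv_url <- sub("\\?.*$", "", generated_url)
csv_url
```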

Download CSV

Let’s use this URL to download the file into a local data folder, unless it is already there.

# set variables
csv_url <- "https://oceanview.pfeg.noaa.gov/erddap/tabledap/cciea_AC.csv"
# if ERDDAP server down (Error in download.file) with URL above, use this:
# csv_url <-
# 'https://raw.githubusercontent.com/noaa-iea/r3-train/master/data/cciea_AC.csv'
dir_data <- "data"
# derived variables
csv <- file.path(dir_data, basename(csv_url))
# create directory
dir.create(dir_data, showWarnings = FALSE)
# download file
if (!file.exists(csv)) download.file(csv_url, csv)

Read table read.csv()

Now open the file by going into the Files pane in RStudio and choosing More -> Show Folder in New Window. Then double-click on data/cciea_AC.csv to open it in your spreadsheet program (like Microsoft Excel, Apple Numbers, or LibreOffice Calc).

# attempt to read csv
d <- read.csv(csv)
# show the data frame
d

Note how the presence of the 2nd line with units makes the values character <chr> data type. But we want numeric values. So we could manually delete that second line of units or look at the help documentation for this function (?read.csv in Console pane; or F1 key with cursor on the function in the code editor pane). Notice the skip argument, which we can implement like so:

# read csv by skipping first two lines, so no header
d <- read.csv(csv, skip = 2, header = FALSE)

# update data frame to original column names
names(d) <- names(read.csv(csv))

# fix year
d$time <- sub("-.+", "", d$time)

d$time <- as.integer(d$time)
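# overwrite the csv with the cleaned table for future reuse
# (the cleaned file has no units row, so it can later be read with a plain read.csv(csv))
write.csv(d, csv, row.names = FALSE)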
head(d)
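The effect of the units row can be reproduced on a tiny made-up table, which may help when the server is unavailable. Everything in `csv_text` is invented for illustration; only its layout (header, units row, then data) mimics the ERDDAP file.

```r
# made-up CSV text: header line, units line, then two data rows
csv_text <- paste(
  "time,landings",
  "UTC,1000s metric tons",
  "2019-01-01T00:00:00Z,1.5",
  "2020-01-01T00:00:00Z,2.5",
  sep = "\n"
)

# reading as-is: the units row forces the numeric column to character
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(raw$landings)

# skipping header and units rows, then restoring the column names
clean <- read.csv(text = csv_text, skip = 2, header = FALSE)
names(clean) <- names(raw)
class(clean$landings)

# keep only the year, as done for d$time above
clean$time <- as.integer(sub("-.+", "", clean$time))
clean
```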

Series line plot aes(color = region)

Let’s show the regional values (CA, OR and WA; not coastwide) in a plot as a series with different colors. To do this, we’ll tidy the data into long format, so that we have one value column holding total_fisheries_revenue and one region column to supply as the group and color aesthetics, both of which are available for geom_line():

d_rgn <- d %>% 
  # select columns
  select(
    time, 
    starts_with("total_fisheries_revenue")) %>% 
  # exclude column
  select(-total_fisheries_revenue_coastwide) %>% 
  # pivot longer
  pivot_longer(-time) %>% 
  # mutate region by stripping other
  mutate(
    region = name %>% 
      str_replace("total_fisheries_revenue_", "") %>% 
      str_to_upper()) %>% 
  # filter for not NA
  filter(!is.na(value)) %>% 
  # select columns
  select(time, region, value)
  
# create plot object
p_rgn <- ggplot(
  d_rgn,
  # aesthetics
  aes(x= time, y = value, group = region, color = region))+
    theme_bw()+
    labs(x = "Year", y = "Millions $")+
    theme(axis.text.x = element_text(color = "black", size = 12, hjust = 1, vjust = 1),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "grey30", face = "bold", size = 14),
        axis.title.y = element_text(color = "grey30", face = "bold", size = 14))+
  # geometry
  geom_line()
# show plot
p_rgn
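The reshaping step is easier to follow on a small table. The sketch below uses made-up numbers and hypothetical column names (`revenue_ca`, `revenue_or`) to show what pivot_longer() does: every column except time is stacked into a name column and a value column.

```r
library(tidyr)

# made-up wide table: one column per region
wide <- data.frame(
  time       = c(2018, 2019),
  revenue_ca = c(1, 2),
  revenue_or = c(3, 4)
)

# long format: one row per time-region combination
long <- pivot_longer(wide, -time, names_to = "name", values_to = "value")
long
```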

When rendering to HTML, you can render most ggplot objects interactively with plotly::ggplotly(). The plotly library is an R htmlwidget providing simple R functions to render interactive JavaScript visualizations.

library(plotly)
plotly::ggplotly(p_rgn)
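The hover text of the interactive plot can be customised. A common pattern, sketched here on the penguins data so that it runs on its own, is to map a text aesthetic in ggplot() and pass tooltip = "text" to ggplotly(). The names `p_scatter` and `w`, and the tooltip wording, are illustrative.

```r
library(ggplot2)
library(palmerpenguins)
library(plotly)

# text is not drawn by ggplot2; ggplotly() uses it for the hover label
p_scatter <- ggplot(
  na.omit(penguins),
  aes(x = flipper_length_mm, y = body_mass_g, color = species,
      text = paste("Island:", island))
) +
  geom_point()

w <- ggplotly(p_scatter, tooltip = "text")
w
```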

Create interactive time series with dygraphs::dygraph()

Another htmlwidget plotting library written more specifically for time series data is dygraphs. Unlike the ggplot2 data input, a series is expected in wide (not tidy long) format. So we use tidyr’s pivot_wider() first.

library(dygraphs)
library(DT)
d_rgn_wide <- d_rgn %>%
    mutate(Year = time) %>%
    select(Year, region, value) %>%
    pivot_wider(names_from = region, values_from = value)
datatable(d_rgn_wide)
d_rgn_wide %>%
    dygraph() %>%
    dyRangeSelector()

  1. Leland Wilkinson. The Grammar of Graphics (Statistics and Computing), 2nd Edition.